CHAINING, INTERPOLATION, AND CONVEXITY


RAMON VAN HANDEL 


Abstract. We show that classical chaining bounds on the suprema of random 
processes in terms of entropy numbers can be systematically improved when 
the underlying set is convex: the entropy numbers need not be computed for 
the entire set, but only for certain “thin” subsets. This phenomenon arises 
from the observation that real interpolation can be used as a natural chaining 
mechanism. Unlike the general form of Talagrand’s generic chaining method, 
which is sharp but often difficult to use, the resulting bounds involve only 
entropy numbers but are nonetheless sharp in many situations in which classical
entropy bounds are suboptimal. Such bounds are readily amenable to
explicit computations in specific examples, and we discover some old and new 
geometric principles for the control of chaining functionals as special cases. 


1. Introduction 


A remarkable achievement of modern probability theory is the development of 
sharp connections between the boundedness of random processes and the geometry 
of the underlying index set. Perhaps the most fundamental result in this direction 
is the characterization of boundedness of Gaussian processes due to Talagrand. 


Theorem 1.1 ([16]). Let $(X_t)_{t\in T}$ be a centered Gaussian process and denote by
$d(t,s) = (\mathbf{E}|X_t - X_s|^2)^{1/2}$ the associated natural metric on $T$. Then

$$\mathbf{E}\Big[\sup_{t\in T} X_t\Big] \asymp \gamma_2(T) := \inf \sup_{t\in T} \sum_{n\ge 0} 2^{n/2} d(t,T_n),$$

where the infimum is taken over all sequences of sets $T_n$ with cardinality $|T_n| < 2^{2^n}$.


The quantity $\gamma_2(T)$ captures precisely what aspect of the geometry of the metric
space $(T, d)$ controls the suprema of Gaussian processes: it quantifies the degree to
which $T$ can be approximated by a sequence of increasingly fine nets $T_n$. While
we quote this particular result for concreteness, the structure that is expressed
by Theorem 1.1, called the generic chaining, extends far beyond the theory of
Gaussian processes and has a substantial impact on various problems in probability,
functional analysis, statistics, and theoretical computer science. An extensive
development of this theory and its implications can be found in [16].

Theorem 1.1 provides a powerful general principle for the study of the suprema 
of random processes. However, when presented with any specific situation, it often 
proves to be remarkably difficult to control $\gamma_2(T)$ efficiently. Theorem 1.1 can only
give sharp results if one is able to construct a nearly optimal sequence of nets $T_n$,
a task that is significantly complicated by the multiscale nature of $\gamma_2(T)$. The
aim of this paper is to exhibit some surprisingly elementary principles that make it
possible to obtain sharp control of $\gamma_2(T)$ in various interesting examples, and that
shed new light on the underlying geometric phenomena. 

There are essentially two general approaches that have been used to control 
$\gamma_2(T)$. The simplest and by far the most useful approach is obtained by bringing
the supremum over $t \in T$ inside the sum in the definition of $\gamma_2(T)$. This yields

$$\gamma_2(T) \le \sum_{n\ge 0} 2^{n/2} e_n(T),$$

where the entropy number $e_n(T)$ is defined as the smallest $\varepsilon > 0$ such that there
is an $\varepsilon$-net in $T$ of cardinality less than $2^{2^n}$. This bound, due to Dudley [7], long
predates Theorem 1.1 and has found widespread use. Its utility stems from the
fact that controlling entropy numbers only requires us to approximate the set $T$
at a single scale, for which numerous methods are available; see, e.g., [10, 8, 2].
Unfortunately, Dudley’s bound can fail to be sharp even in the simplest examples, 
such as ellipsoids in Hilbert space. In fact, the supremum of a random process 
on T cannot in general be understood in terms of the entropy numbers of T : one 
can easily construct two such sets with comparable entropy numbers on which a 
Gaussian process behaves very differently [14]. It is therefore a crucial feature of 
Theorem 1.1 that the use of entropy numbers is replaced by a genuinely multiscale 
form of approximation. The construction of such a multiscale approximation in any 
given situation is however a highly nontrivial task. 

The main approach that has been developed for the latter purpose is Talagrand’s 
growth functional machinery [16] that forms the core of the proof of Theorem 1.1. 
To show that $\gamma_2(T)$ is upper bounded by the expected supremum of the Gaussian
process, the proof of Theorem 1.1 constructs nets $T_n$ by means of a greedy partitioning
scheme that uses the Gaussian process itself $G(A) := \mathbf{E}[\sup_{t\in A} X_t]$ as an
objective function. It turns out that the success of this proof relies on the properties
of Gaussian processes only through the validity of a single “growth condition” of
the functional $G$. If one can design another functional $F$ that mimics this property
of Gaussian processes, then the same proof also yields an upper bound on $\gamma_2(T)$
in terms of $F(T)$. An important example of such a construction is the proof that
$\gamma_2(T)$ is strictly smaller than Dudley’s bound when $T$ is a $q$-convex body [16, §4.1].
It is generally far from obvious, however, how a functional F can be designed, and 
successful application of this approach requires considerable ingenuity. 

In this paper, we develop a new approach that is intermediate between these 
two extremes. The central insight of this paper is that it is possible to improve 
systematically on Dudley’s bound without giving up the formulation in terms of 
entropy numbers. Of course, as was noted above, we cannot expect to improve on 
Dudley’s bound in a general setting in terms of the entropy numbers of T itself. 
Instead, we will show that when $T$ is a convex set, the entropy numbers $e_n(T)$ in
Dudley’s bound can be replaced by the entropy numbers of certain “thin” subsets 
that can be substantially smaller than T. (The convexity assumption is not essential 
for our approach, but leads to a cleaner statement of the results.) 

To illustrate this idea, let us begin by stating a useful form of such a result. Let 
$(X, \|\cdot\|)$ be a Banach space, and let $B \subset X$ be a symmetric compact convex set.
We denote by $\|\cdot\|_B$ the gauge of $B$, and by $\|\cdot\|_B^*$ and $\|\cdot\|^*$ the dual norms on $X^*$.
In this setting, we will always choose the distance $d$ in the definitions of $\gamma_2(B)$ and
$e_n(B)$ to be the one generated by the norm $d(x,y) := \|x - y\|$.

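Theorem 1.2. Let $B \subset (X, \|\cdot\|)$ be a symmetric compact convex set, and define

$$B_t := \{y \in B : \exists\, z \in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^* \le 1,\ \|z\|^* \le t\}.$$

Then we have for any $a > 0$

$$\gamma_2(B) \lesssim \frac{1}{a} + \sum_{n\ge 0} 2^{n/2} e_n(B_{a2^{n/2}}).$$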

The bound of Theorem 1.2 proves to be sharp in many situations in which 
Dudley’s bound is suboptimal, and often provides a simple explanation for why 
this is the case. At the same time, Theorem 1.2 is typically no more difficult to
apply than Dudley’s bound, as the “thin” subsets $B_t \subseteq B$ that appear in this bound
can be found in quite explicit form. For example, if $B$ is a smooth symmetric convex
body in $\mathbb{R}^d$, then it is a classical fact that $\nabla\|x\|_B$ is the unique norming functional
for the norm $\|\cdot\|_B$ at the point $x$, so that we can simply write

$$B_t = \{y \in B : \|\nabla\|y\|_B\|^* \le t\}.$$

Such expressions are readily amenable to explicit computations. 

One of the nice features of Theorem 1.2 is that the phenomenon that it describes 
arises in a completely elementary fashion. To understand its origin, let us sketch 
the simple idea behind the proof. The basic challenge in controlling $\gamma_2(B)$ is to
approximate the unit ball of the norm $\|\cdot\|_B$ in terms of another norm $\|\cdot\|$. It
proves to be useful to connect these two norms using an idea that is inspired by
real interpolation of Banach spaces [4]. To this end, define Peetre’s $K$-functional

$$K(t,x) := \inf_{y}\{\|y\|_B + t\|x - y\|\} = \|\pi_t(x)\|_B + t\|x - \pi_t(x)\|,$$

where $\pi_t(x)$ is any minimizer in the definition of $K(t,x)$ (assume for simplicity that
we work in a finite-dimensional Banach space to avoid trivial technicalities). It is
easily seen that $\lim_{t\to\infty} K(t,x) = \|x\|_B$, $K(0,x) = 0$, and $\frac{d}{dt}K(t,x) = \|x - \pi_t(x)\|$
(the latter follows by observing that $\|x - \pi_t(x)\|$ is a supergradient of the concave
function $t \mapsto K(t,x)$, so it must equal $\frac{d}{dt}K(t,x)$ a.e.; see Proposition 2.3 below).
We therefore obtain by the fundamental theorem of calculus

$$\|x\|_B = \int_0^\infty \|x - \pi_t(x)\|\,dt \asymp \sum_{n\ge 0} 2^{n/2}\|x - \pi_{2^{n/2}}(x)\|,$$

where the last step follows from a Riemann sum approximation of the integral. This 
leads immediately to the following observation: if we define the sets 

$$B_t := \{\pi_t(x) : x \in B\},$$

then we have shown that

$$\sup_{x\in B}\sum_{n\ge 0} 2^{n/2} d(x, B_{2^{n/2}}) \lesssim 1.$$

In other words, we see that a natural chaining mechanism is in fact built into the
real interpolation method: we automatically generate a multiscale approximation
of $B$ in terms of the sets $B_t$. In order to bound $\gamma_2(B)$, it remains to choose a finite
net with the appropriate cardinality inside each of the sets $B_t$. (While it may not
be immediately obvious, the definition of $B_t$ given in Theorem 1.2 is none other
than the dual formulation of the definition of $B_t$ as a set of minimizers.)

It should be clear at this point that convexity is not essential in the construction 
using real interpolation: convexity only enters the proof of Theorem 1.2 in order 
to obtain the convenient formulation of the sets $B_t$. In section 2, we first prove
a general form of Theorem 1.2 that is applicable in any metric space; we also
formulate the results for more general $\gamma_p$-functionals that appear when the generic
chaining method is applied to non-Gaussian processes. We then specialize to the
convex setting and derive the dual formulation of $B_t$. In section 3, we illustrate
the power of Theorem 1.2 in a number of explicit examples. We also illustrate by 
means of an example that Theorem 1.2 does not always give sharp results. 

Theorem 1.2 improves on Dudley’s bound by replacing the entropy numbers of 
$B$ by the entropy numbers of the smaller sets $B_t$. A rather different improvement
arises when $B$ is $q$-convex, for which Talagrand shows that [16, §4.1]

$$\gamma_2(B) \lesssim \Big[\sum_{n\ge 0}\big(2^{n/2} e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q}.$$

This bound involves only the entropy numbers of the set B itself, and appears at first 
sight to be quite different in nature than Theorem 1.2. Nonetheless, we show in section 4
that this fundamental result is a direct consequence of Theorem 1.2. Roughly
speaking, we will see that the $q$-convexity assumption forces the sets $B_t$ to be much
smaller than the original set $B$ in the sense that $e_n(B_t) \lesssim t^{1/(q-1)} e_n(B)^{q/(q-1)}$.

In fact, it turns out there is nothing particularly special about uniform convexity: 
Talagrand’s result is a special case of a more general geometric phenomenon that 
will be developed in section 4. As another illustration of this phenomenon, we will 
show that Talagrand’s bound for $q$-convex bodies holds verbatim for $\ell_q$-balls in
Banach spaces with an unconditional basis for every $1 < q < \infty$. Note that such
sets are only 2-convex rather than $q$-convex when $1 < q < 2$, so that the behavior
of $\ell_q$-balls is evidently not explained by uniform convexity.

The connection between interpolation and generic chaining appears in hindsight 
to be entirely natural. Many generic chaining constructions (that appear in [16, 15],
for example) have a flavor of interpolation, and even the multiscale notion of approximation
that is intrinsic to the definition of $\gamma_2(T)$ has appeared independently
in interpolation theory in the study of approximation spaces [13, 6, 12]. To the
best of the author’s knowledge, however, the results of this paper are the first to 
explicitly develop this connection. It would be interesting to understand whether 
broader interactions exist between these areas of probability and analysis. 


2. Chaining, Interpolation, and Convexity 

The aim of this section is to develop the basic connections between chaining, 
interpolation, and convexity that lie at the heart of this paper. In section 2.1, we
develop an abstract chaining principle that holds in any metric space. In section 
2.2, we specialize to the convex setting and complete the proof of Theorem 1.2. 

2.1. Chaining and interpolation. In this section, let $(X, d)$ be any metric space.
We begin by defining formally the notions of entropy numbers and Talagrand’s $\gamma_p$-functionals.
The case $p = 2$ arises in the context of Gaussian processes together
with the associated natural metric, cf. Theorem 1.1; however, other values of $p$ and
more general metrics can arise for other random processes [16].

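Definition 2.1. For any $A \subseteq X$ and $n \ge 0$, define the entropy number

$$e_n(A) := \inf_{|S| < 2^{2^n}} \sup_{x\in A} d(x,S),$$

and define for $p > 0$ the $\gamma_p$-functional

$$\gamma_p(A) := \inf \sup_{x\in A} \sum_{n\ge 0} 2^{n/p} d(x,A_n),$$

where the infimum is taken over all sequences of sets $A_n$ with $|A_n| < 2^{2^n}$.
(The approximating sets $A_n \subseteq X$ are not necessarily subsets of $A$.)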
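
To make the definition concrete, the following minimal sketch evaluates the chaining sum $\sup_x \sum_n 2^{n/p} d(x,A_n)$ for one fixed, hand-picked sequence of nets on a finite metric space. Any such value is an upper bound on $\gamma_p$; no optimization over nets is attempted. The function name `chaining_sum`, the toy point set, and the nets are assumptions made purely for illustration.

```python
def chaining_sum(points, nets, p, dist):
    """Return sup_x sum_n 2^(n/p) * d(x, A_n) for a given finite list of nets.

    nets[n] plays the role of A_n and must satisfy |A_n| < 2^(2^n).
    If the last net contains every point, all later terms would vanish,
    so the finite sum is a valid upper bound on gamma_p(points).
    """
    worst = 0.0
    for x in points:
        total = 0.0
        for n, net in enumerate(nets):
            if len(net) >= 2 ** (2 ** n):
                raise ValueError(f"net {n} violates |A_n| < 2^(2^n)")
            total += 2 ** (n / p) * min(dist(x, a) for a in net)
        worst = max(worst, total)
    return worst


# Hypothetical toy example: 16 equally spaced points on [0, 1].
points = [i / 15 for i in range(16)]
dist = lambda x, y: abs(x - y)
nets = [
    [0.5],            # n = 0: fewer than 2 points
    [0.2, 0.5, 0.8],  # n = 1: fewer than 4 points
    points[:15],      # n = 2: fewer than 16 points
    points,           # n = 3: every point is in the net
]
bound = chaining_sum(points, nets, p=2, dist=dist)
print(bound)
```

A better choice of nets can only lower the value; the infimum over all admissible sequences is $\gamma_p$ itself.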

Fix a set $A \subseteq X$ for the remainder of this section. To measure the size of $A$, we
introduce a penalty function $f : X \to \mathbb{R}_+ \cup \{+\infty\}$ that may in principle be chosen
arbitrarily. Consider the corresponding optimization problem

$$K(t,x) := \inf_{y\in X}\{f(y) + t\,d(x,y)\}$$

for every $t \ge 0$ and $x \in A$. We will assume for simplicity that the infimum in
this optimization problem is attained for every $t \ge 0$ and $x \in A$, and denote by
$\pi_t(x)$ any choice of minimizer in the definition of $K(t,x)$. (It is a trivial exercise
to extend our results to the setting where $\pi_t(x)$ is a near-minimizer, but such an
extension will not be needed in the sequel.) We now define for every $t \ge 0$ the set

$$A_t := \{\pi_t(x) : x \in A\}.$$

Remark 2.2. In the present formulation, $A_t$ is not necessarily a subset of $A$.
However, it is natural to choose a penalty function $f$ such that $A = \{x : f(x) \le 1\}$,
in which case evidently $A_t \subseteq A$ (because $f(\pi_t(x)) \le K(t,x) \le f(x)$).

The following result lies at the heart of this paper. In the sequel, we write $a \lesssim b$
if $a \le Cb$ for a universal constant $C$, and $a \asymp b$ if $a \lesssim b$ and $b \lesssim a$. We indicate
explicitly when the universal constant depends on some parameter in the problem.

Proposition 2.3. In the setting of this section, we have for every $a > 0$

$$\gamma_p(A) \lesssim \frac{1}{a}\sup_{x\in A} f(x) + \sum_{n\ge 0} 2^{n/p} e_n(A_{a2^{n/p}}),$$

where the universal constant depends on $p$ only.

Proof. We can assume without loss of generality that $f$ is uniformly bounded on $A$.
Thus $0 \le K(t,x) \le f(x) < \infty$ for every $x \in A$ and $t \ge 0$. Moreover, $t \mapsto K(t,x)$
is clearly a concave function for every $x \in A$. We now use some basic facts about
univariate concave functions [9, Chapter I]. First, we note that

$$K(t,x) - K(s,x) = \inf_{y\in X}\{f(y) + t\,d(x,y)\} - f(\pi_s(x)) - s\,d(x,\pi_s(x)) \le (t-s)\,d(x,\pi_s(x))$$

for all $t, s \ge 0$, so that $d(x,\pi_s(x))$ is a supergradient of $t \mapsto K(t,x)$ at $t = s$. As a
bounded concave function is absolutely continuous, we obtain

$$K(T,x) - K(0,x) = \int_0^T d(x,\pi_t(x))\,dt$$

for every $T \ge 0$ and $x \in A$. In particular, we can estimate

$$\int_0^\infty d(x,\pi_t(x))\,dt \le f(x)$$

for every $x \in A$. We also recall that the derivative of a concave function is nonincreasing,
so that we can discretize the integral as follows:

$$f(x) \ge \int_0^a d(x,\pi_t(x))\,dt + \sum_{n\ge 1}\int_{a2^{(n-1)/p}}^{a2^{n/p}} d(x,\pi_t(x))\,dt \ge (1 - 2^{-1/p})\,a\sum_{n\ge 0} 2^{n/p} d(x,\pi_{a2^{n/p}}(x)),$$

where we used that $t \mapsto d(x,\pi_t(x))$ is nonincreasing in the last step.

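It remains to discretize the minimizers $\pi_t(x)$. By the definition of entropy numbers,
we can choose for every $n \ge 0$ a set $A_n \subseteq X$ such that $|A_n| < 2^{2^n}$ and

$$\sup_{x\in A_{a2^{n/p}}} d(x,A_n) \le 2e_n(A_{a2^{n/p}}).$$

We can therefore estimate

$$\begin{aligned}
\gamma_p(A) &\le \sup_{x\in A}\sum_{n\ge 0} 2^{n/p} d(x,A_n) \\
&\le \sup_{x\in A}\sum_{n\ge 0} 2^{n/p} d(x,\pi_{a2^{n/p}}(x)) + \sum_{n\ge 0} 2^{n/p}\sup_{x\in A} d(\pi_{a2^{n/p}}(x),A_n) \\
&\lesssim \frac{1}{a}\sup_{x\in A} f(x) + \sum_{n\ge 0} 2^{n/p} e_n(A_{a2^{n/p}}),
\end{aligned}$$

which completes the proof. □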
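
The discretization step of the proof can be checked numerically on a finite metric space, where the infimum defining $K(t,x)$ is always attained. The sketch below is only an illustration under stated assumptions: the random point cloud, the penalty `f` (distance to the origin), and the helper names `k_functional` and `discretized_sum` are all hypothetical choices, and the infinite sum is truncated to its first `n_terms` terms (which can only decrease it).

```python
import random


def k_functional(t, x, space, f, dist):
    """Return (K(t, x), pi_t(x)) where K(t, x) = min_y f(y) + t * d(x, y)."""
    best = min(space, key=lambda y: f(y) + t * dist(x, y))
    return f(best) + t * dist(x, best), best


def discretized_sum(x, space, f, dist, a, p, n_terms=40):
    """First n_terms terms of (1 - 2^(-1/p)) * a * sum_n 2^(n/p) d(x, pi_{a 2^(n/p)}(x))."""
    total = 0.0
    for n in range(n_terms):
        t = a * 2 ** (n / p)
        _, y = k_functional(t, x, space, f, dist)
        total += 2 ** (n / p) * dist(x, y)
    return (1 - 2 ** (-1 / p)) * a * total


random.seed(0)
# Hypothetical finite metric space: 60 random points in the unit square.
space = [(random.random(), random.random()) for _ in range(60)]
dist = lambda x, y: ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5
f = lambda y: dist(y, (0.0, 0.0))  # illustrative penalty, not from the paper

# Largest ratio of the discretized sum to f(x); the proof predicts a value <= 1.
worst_ratio = max(
    discretized_sum(x, space, f, dist, a=1.0, p=2) / f(x) for x in space
)
print(worst_ratio)
```

Here $A = X$ is the whole point cloud, so every $x$ is admissible in the minimization and $f(x)$ is finite.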

Remark 2.4. Suppose we replace the penalty $f$ by an equivalent penalty $\tilde f \asymp f$.
Then the first term in the bound of Proposition 2.3 only changes by a universal
constant, but the second term might change substantially as the definition of the
sets $A_t$ is highly nonlinear. This highlights the nontrivial nature of the choice of
penalty. Similarly, the bound of Theorem 1.2 could potentially give better results if
we replace $B$ by an equivalent set $cB \subseteq \tilde B \subseteq CB$. Note that the same phenomenon
arises when applying the growth functional machinery of [16]: the growth condition 
is not preserved if we choose an equivalent functional. This appears to be an 
inherent difficulty that arises in the control of chaining functionals. 

2.2. Convexity. While Proposition 2.3 provides a very general chaining principle 
in metric spaces, it is not immediately obvious how to apply this result in any 
given situation. The problem is that the sets At that appear in the previous section 
are defined implicitly as families of solutions to certain optimization problems; in 
the absence of a more explicit characterization, the computation of the entropy 
numbers $e_n(A_{a2^{n/p}})$ can be a challenging problem. To address this problem, we
specialize our results from this point onwards to the case where the set of interest 
is convex and where the penalty function is chosen to be the associated gauge. The 
convexity assumption makes it possible to obtain a dual formulation of the sets of 
optimizers that is readily amenable to explicit computations. The advantages of 
this formulation will be amply illustrated in the following sections. 

We now introduce the setting that will be used throughout the remainder of this
paper. Let $(X, \|\cdot\|)$ be a Banach space, and let $B \subset X$ be a symmetric compact
convex set. The metric $d$ that appears in the definitions of the entropy numbers
$e_n(B)$ and the functionals $\gamma_p(B)$ (cf. Definition 2.1) will always be chosen to be
defined by the norm $d(x,y) := \|x - y\|$ on the underlying Banach space. The gauge
(Minkowski functional) of $B$ will be denoted $\|\cdot\|_B$, that is,

$$\|x\|_B := \inf\{s \ge 0 : x \in sB\}$$

for $x \in X$. Denote by $\|\cdot\|_B^*$ and $\|\cdot\|^*$ the associated dual gauge and norm, that is,

$$\|z\|_B^* := \sup_{\|x\|_B\le 1}\langle z,x\rangle = \sup_{x\in B}\langle z,x\rangle, \qquad \|z\|^* := \sup_{\|x\|\le 1}\langle z,x\rangle$$

for $z \in X^*$. The key point of this section is the following duality result, which
shows that the minimizers of the $K$-functional in the convex setting define a form
of projection onto an explicitly defined scale of subsets $B_t \subseteq B$.

Proposition 2.5. For every $t \ge 0$, there is a map $\pi_t : B \to B$ such that:

(i) $\pi_t(x)$ is a minimizer for Peetre’s $K$-functional for every $x \in B$:

$$K(t,x) := \inf_{y\in X}\{\|y\|_B + t\|x - y\|\} = \|\pi_t(x)\|_B + t\|x - \pi_t(x)\|.$$

(ii) The set of minimizers

$$B_t := \{\pi_t(x) : x \in B\}$$

can be characterized as

$$B_t = \{y \in B : \exists\, z \in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^* \le 1,\ \|z\|^* \le t\}.$$

(iii) We have $\pi_t(x) = x$ for every $x \in B_t$.

Proof. The result holds trivially for t = 0, so we fix t > 0 in the sequel. 

Step 1. Let $B_K := \mathrm{conv}(B \cup t^{-1}B_{\|\cdot\|})$, where $B_{\|\cdot\|}$ is the closed unit ball in $(X, \|\cdot\|)$.
For completeness, we recall the proof of the elementary fact that $K(t,x) = \|x\|_{B_K}$
for every $x \in X$, where $\|\cdot\|_{B_K}$ denotes the gauge of $B_K$.

Suppose first that $K(t,x) < r$, so there exists $y \in X$ with $\|y\|_B + t\|x - y\| < r$.
Then writing $x = \lambda x_1 + \mu x_2$ with $x_1 = y/\|y\|_B$ and $x_2 = (x - y)/t\|x - y\|$ readily
implies that $\|x\|_{B_K} < r$. In the converse direction, suppose that $\|x\|_{B_K} < r$, so
that $x = \lambda x_1 + \mu x_2$ for some $|\lambda| + |\mu| < r$, $x_1 \in B$, $x_2 \in t^{-1}B_{\|\cdot\|}$. Then choosing
$y = \lambda x_1$ in the definition of $K(t,x)$ shows that $K(t,x) < r$.

Step 2. We now establish the existence of a minimizer in the definition of
$K(t,x)$ for every $x \in X$. This is a direct consequence of the previous step and the
compactness of $B$. Indeed, as $B$ is compact, the set $B_K$ is closed. Thus $K(t,x) = r$
implies $x \in rB_K$, so there exist $|\lambda| + |\mu| \le r$ and $x_1 \in B$, $x_2 \in t^{-1}B_{\|\cdot\|}$ such that
$x = \lambda x_1 + \mu x_2$. It follows that $y = \lambda x_1$ is a minimizer for $K(t,x)$, as

$$K(t,x) \le \|\lambda x_1\|_B + t\|\mu x_2\| \le r = K(t,x).$$

Step 3. Define the set

$$B_t' := \{y \in B : K(t,y) = \|y\|_B\}.$$

We can characterize this set by duality. Indeed, note that

$$K(t,y) = \sup\{\langle z,y\rangle : z \in X^*,\ \|z\|_B^* \le 1,\ \|z\|^* \le t\},$$

where we have used the polar identity $B_K^\circ = B^\circ \cap tB_{\|\cdot\|}^\circ$. Moreover, the supremum
is attained at some point $z \in X^*$ by the Hahn-Banach theorem. Therefore, if
$y \in B_t'$, then there exists $z \in X^*$ such that $\langle z,y\rangle = \|y\|_B$, $\|z\|_B^* \le 1$, and $\|z\|^* \le t$.
Conversely, if $y \in B$ is such that a point $z$ satisfying the latter properties exists,
then $\|y\|_B = \langle z,y\rangle \le K(t,y) \le \|y\|_B$ so that $y \in B_t'$. Thus we have

$$B_t' = \{y \in B : \exists\, z \in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^* \le 1,\ \|z\|^* \le t\}.$$

Step 4. Define the map $\pi_t : B \to B$ as follows. For $x \in B_t'$, we set $\pi_t(x) = x$.
For $x \notin B_t'$, we choose $\pi_t(x)$ to be any minimizer in the definition of $K(t,x)$. We
are going to verify that each of the claims in the statement of the Proposition holds.

Let us first note that $\pi_t$ does indeed map $B$ into itself. For $x \in B_t'$ this is true
by construction. For $x \notin B_t'$, this is true because $\|\pi_t(x)\|_B \le K(t,x) \le \|x\|_B$.
Moreover, note that when $x \in B_t'$, by construction $y = x = \pi_t(x)$ is a minimizer in
the definition of $K(t,x)$. We have therefore established part (i).

To prove parts (ii) and (iii), it suffices to show that $B_t = B_t'$. That $B_t' \subseteq B_t$ is
obvious from the fact that $\pi_t(x) = x$ for $x \in B_t' \subseteq B$. To establish the converse
inclusion, we argue as follows. Fix $x \in B$, and choose $z \in X^*$ such that $K(t,x) =
\langle z,x\rangle$, $\|z\|_B^* \le 1$, and $\|z\|^* \le t$. By the bipolar theorem, we can write

$$\langle z,\pi_t(x)\rangle \le \|\pi_t(x)\|_B = \langle z,\pi_t(x)\rangle + \langle z,x - \pi_t(x)\rangle - t\|x - \pi_t(x)\| \le \langle z,\pi_t(x)\rangle.$$

This implies that $\pi_t(x) \in B_t'$, and thus $B_t \subseteq B_t'$. □

Remark 2.6. When $B$ is a symmetric convex body in a finite-dimensional Banach
space, the details of the proof of Proposition 2.5 simplify significantly. It is an 
instructive exercise to give a quick proof in this case using subdifferential calculus. 

The proof of Theorem 1.2 in the introduction now follows trivially. For future 
reference, we formulate the analogous result for $\gamma_p$-functionals.

Corollary 2.7. Let $B \subset (X, \|\cdot\|)$ be a symmetric compact convex set, and define

$$B_t := \{y \in B : \exists\, z \in X^* \text{ such that } \langle z,y\rangle = \|y\|_B,\ \|z\|_B^* \le 1,\ \|z\|^* \le t\}.$$

Then we have for any $a > 0$

$$\gamma_p(B) \lesssim \frac{1}{a} + \sum_{n\ge 0} 2^{n/p} e_n(B_{a2^{n/p}}),$$

where the universal constant depends on $p$ only.

Proof. This is simply the combined statement of Proposition 2.3, where we choose
the penalty $f(x) = \|x\|_B$ and distance $d(x,y) = \|x - y\|$, and Proposition 2.5. □

We end this section by emphasizing a remark that was also made in the introduction.
Recall that a symmetric convex set $B \subset X$ is called smooth if for every
$x \in X$, $x \ne 0$ there is a unique $z \in X^*$ so that $\langle z,x\rangle = \|x\|_B$ and $\|z\|_B^* \le 1$, cf. [3].

Corollary 2.8. Let $B$ be a symmetric convex body in a finite-dimensional Banach
space $(X, \|\cdot\|)$, and denote by $\partial\|y\|_B$ the subdifferential of $\|y\|_B$. Then

$$B_t = \Big\{y \in B : \inf_{z\in\partial\|y\|_B}\|z\|^* \le t\Big\}.$$

In particular, if $B$ is smooth, then

$$B_t = \{y \in B : \|\nabla\|y\|_B\|^* \le t\}.$$

Proof. It is a classical fact that $\partial\|y\|_B = \{z \in X^* : \langle z,y\rangle = \|y\|_B,\ \|z\|_B^* \le 1\}$, so
that the result follows readily from Proposition 2.5; cf. [9, Chapter VI]. □

The explicit nature of Corollary 2.8 is particularly useful in computations. 

3. Examples

The aim of this section is to illustrate the utility of Theorem 1.2 in explicit 
computations by investigating some simple but conceptually interesting examples. 
As our goal is to develop insight into the phenomenon described by Theorem 1.2, 
we have avoided unnecessary distractions by restricting attention to situations in 
which existing entropy estimates can be used. 

We write $\|x\|_r := [\sum_{i=1}^d |x_i|^r]^{1/r}$ and denote by $e_1,\ldots,e_d$ the standard basis in
$\mathbb{R}^d$. Throughout this section, we work in Euclidean space $(\mathbb{R}^d, \|\cdot\|)$ where $\|\cdot\| := \|\cdot\|_2$.
The concrete choice of the Euclidean norm is not important for our theory, but
is made in order to enable explicit computations and is natural in the setting of
Gaussian processes (as it corresponds to the canonical choice $X_t = \langle t,g\rangle$ in Theorem
1.1, where $g$ is a standard Gaussian vector in $\mathbb{R}^d$). Some of the examples developed
here will be revisited in section 4 in a much more general setting.

3.1. $\ell_q$-Ellipsoids. The classical example of a situation where Dudley’s bound fails
to be sharp is that of ellipsoids in Hilbert space. In this section, we will investigate
the following more general situation. Given scalars $1 < q < \infty$ and $b_1 \ge b_2 \ge \cdots \ge
b_d > 0$, let $B \subset \mathbb{R}^d$ be the $\ell_q$-ellipsoid whose gauge is given by

$$\|x\|_B = \Big[\sum_{i=1}^d \Big|\frac{x_i}{b_i}\Big|^q\Big]^{1/q}.$$

We will show that Theorem 1.2 yields the following optimal bound.

Proposition 3.1. In the setting of this section, we have

$$\gamma_2(B) \lesssim \Big(\sum_{i=1}^d b_i^{q/(q-1)}\Big)^{(q-1)/q},$$

where the universal constant depends on $q$ only.

Of course, this result can easily be obtained from Theorem 1.1, but our aim is 
to provide a geometric proof that explains why the result is true. 

In order to apply either Dudley’s bound or Theorem 1.2, we will require suitable 
estimates on the entropy numbers of $\ell_q$-ellipsoids. The behavior of these entropy
numbers is investigated in detail in a classic paper by Carl [5] (in the special case
of $\ell_2$-ellipsoids a much more elementary approach can be found in [16, §2.5]). For
future reference, we record a more general form of the main result of Carl than is
presently needed. The following can be read off from the proof of [5, Theorem 2].
(While the result of Carl is formulated only for $r \ge 1$, the proof extends directly to
the case $0 < r < 1$ if we replace [5, Theorem 1] by [8, Proposition 3.2.2].)

Lemma 3.2 ([5]). Given $0 < r \le \infty$, $1/s > (1/2 - 1/r)_+$, $0 < u < \infty$, and scalars
$c_1 \ge c_2 \ge \cdots \ge c_d > 0$, the $\ell_r$-ellipsoid $C = \{x \in \mathbb{R}^d : \|(x_i/c_i)\|_r \le 1\}$ satisfies

$$\sum_{n\ge 0}\big(2^{n(1/s+1/r-1/2)} e_n(C)\big)^u \asymp \sum_{k=1}^d k^{u/s-1} c_k^u,$$

where the universal constant depends on $r, s, u$ only.

Applying this result with $r = q$, $1/s = 1 - 1/q$, and $u = 1$ yields

$$\sum_{n\ge 0} 2^{n/2} e_n(B) \asymp \sum_{k=1}^d k^{-1/q} b_k.$$

We therefore see immediately that Dudley’s bound is suboptimal for $\ell_q$-ellipsoids:
Dudley’s bound is much larger than $\gamma_2(B)$, say, when $b_k = k^{-(q-1)/q}$.
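
The size of the gap can be seen numerically. The sketch below compares, up to constants depending on $q$, the order $\sum_k k^{-1/q} b_k$ of Dudley's sum (as given by Lemma 3.2 with $r = q$, $1/s = 1 - 1/q$, $u = 1$) with the order $(\sum_k b_k^{q/(q-1)})^{(q-1)/q}$ of the bound in Proposition 3.1. The semiaxes $b_k = k^{-(q-1)/q}$ and all function names are assumptions chosen for illustration; with this choice the ratio equals $(\sum_{k\le d} 1/k)^{1/q}$, which grows like $(\log d)^{1/q}$.

```python
def dudley_order(b, q):
    """Order of Dudley's sum for the l_q-ellipsoid: sum_k k^(-1/q) * b_k."""
    return sum(k ** (-1 / q) * bk for k, bk in enumerate(b, start=1))


def sharp_order(b, q):
    """Order of the bound of Proposition 3.1: (sum_k b_k^(q/(q-1)))^((q-1)/q)."""
    conj = q / (q - 1)
    return sum(bk ** conj for bk in b) ** (1 / conj)


def semiaxes(d, q):
    """Illustrative semiaxes b_k = k^(-(q-1)/q) (an assumption of this sketch)."""
    return [k ** (-(q - 1) / q) for k in range(1, d + 1)]


def ratio(d, q):
    """Ratio of the two orders; constants depending on q are ignored."""
    b = semiaxes(d, q)
    return dudley_order(b, q) / sharp_order(b, q)


for d in (10, 1000, 100000):
    print(d, ratio(d, q=2.0))
```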

To obtain a sharp bound, we will apply Theorem 1.2. The crux of the matter is 
to control the sets $B_t$. In the present setting, this is exceedingly simple and gives
a vivid illustration of where the improvement over Dudley’s bound comes from. 


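Proof of Proposition 3.1. Note that $B$ is a smooth convex body with

$$\frac{\partial\|y\|_B}{\partial y_k} = \frac{1}{b_k^q}\,\frac{|y_k|^{q-1}}{\|y\|_B^{q-1}}\,\mathrm{sign}(y_k).$$

Thus Corollary 2.8 gives

$$B_t = \{y \in B : \|y\|_C \le t^{1/(q-1)}\|y\|_B\} \subseteq t^{1/(q-1)}C,$$

where

$$\|y\|_C = \Big[\sum_{i=1}^d \Big(\frac{|y_i|}{b_i^{q/(q-1)}}\Big)^{2q-2}\Big]^{1/(2q-2)}.$$

Substituting $B_t \subseteq t^{1/(q-1)}C$ into Theorem 1.2 and optimizing over $a > 0$ yields

$$\gamma_2(B) \lesssim \Big(\sum_{n\ge 0} 2^{nq/(2q-2)} e_n(C)\Big)^{(q-1)/q}.$$

The conclusion follows by applying Lemma 3.2 with $r = 2q - 2$ and $s = u = 1$. □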
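
The computation behind this proof rests on the identity $\|\nabla\|y\|_B\|_2 = (\|y\|_C/\|y\|_B)^{q-1}$, where $C$ is the $\ell_{2q-2}$-ellipsoid with semiaxes $b_i^{q/(q-1)}$. As a sanity check, the following sketch evaluates both sides at a random point and compares the closed-form gradient with a central finite difference. The function names, the semiaxes, and the test point are assumptions made for illustration only.

```python
import math
import random


def gauge_B(y, b, q):
    """Gauge of the l_q-ellipsoid with semiaxes b."""
    return sum(abs(yi / bi) ** q for yi, bi in zip(y, b)) ** (1 / q)


def grad_gauge_B(y, b, q):
    """Closed form: |y_k|^(q-1) * sign(y_k) / (b_k^q * ||y||_B^(q-1))."""
    g = gauge_B(y, b, q)
    return [
        math.copysign(abs(yk) ** (q - 1), yk) / (bk ** q * g ** (q - 1))
        for yk, bk in zip(y, b)
    ]


def gauge_C(y, b, q):
    """Gauge of the l_{2q-2}-ellipsoid with semiaxes b_k^(q/(q-1))."""
    r = 2 * q - 2
    return sum(
        (abs(yk) / bk ** (q / (q - 1))) ** r for yk, bk in zip(y, b)
    ) ** (1 / r)


def euclid(v):
    return math.sqrt(sum(vi * vi for vi in v))


random.seed(1)
q = 1.5
b = [1.0, 0.7, 0.4, 0.2]  # hypothetical semiaxes
y = [random.choice((-1, 1)) * random.uniform(0.5, 1.5) for _ in b]

lhs = euclid(grad_gauge_B(y, b, q))
rhs = (gauge_C(y, b, q) / gauge_B(y, b, q)) ** (q - 1)
print(lhs, rhs)
```

In particular, the condition $\|\nabla\|y\|_B\|_2 \le t$ is the same as $\|y\|_C \le t^{1/(q-1)}\|y\|_B$, which is the description of $B_t$ used in the proof.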


The key point of the proof of Proposition 3.1 is that each subset $B_t$ of the $\ell_q$-ellipsoid
$B$ is contained in a dilation of the much “thinner” $\ell_{2q-2}$-ellipsoid $C$: the
lengths of the semiaxes of $C$ have been raised to the power $q/(q-1)$ as compared
to those of $B$. This is precisely why we obtain the correct powers of $b_i$ inside the
sum in Proposition 3.1. The author sees no obvious way to explain this miracle
other than that it drops out of the trivial explicit computation performed above.
However, a deeper understanding of the geometry of the sets $B_t$ for $\ell_q$-ellipsoids
will be obtained in a much more general setting in section 4.

Remark 3.3. There exist two previous geometric proofs of Proposition 3.1 for 
special values of q. The first, in [11, §15.6], gives a delicate manual construction 
of an equivalent formulation of $\gamma_2(B)$ for $q = 2$. The second, in [16, §4.1], deduces
the result for $2 \le q < \infty$ from a more general bound for uniformly convex bodies
that is proved using the growth functional machinery. We will revisit the latter
idea in section 4, where we will also see that uniform convexity fails to explain the
behavior of $\ell_q$-ellipsoids for $1 < q < 2$. That we have obtained a sharp bound for
every value of $q$ with the same proof therefore hides the fact that $\ell_q$-ellipsoids can
have a very different geometry for different values of $q$.

Remark 3.4. The universal constant in Proposition 3.1 must necessarily depend 
on $q$: if this were not the case, then we would obtain $\gamma_2(B) \lesssim b_1$ in the limit
$q \downarrow 1$, which is easily seen to be false by Theorem 1.1. Unfortunately, the entropy
estimates provided by Lemma 3.2 are not sufficiently accurate to recover the correct
behavior as $q \downarrow 1$. This is not a deficiency of Theorem 1.2, however: the case $q = 1$
is of particular interest in its own right and will be investigated in the next section. 


3.2. Octahedra. In this section, we investigate the limiting case $q = 1$ of the
example developed in the previous section. That is, given scalars $b_1 \ge b_2 \ge \cdots \ge
b_d > 0$, we investigate the octahedron $B$ defined by

$$B = \mathrm{absconv}\{b_i e_i : i = 1,\ldots,d\}.$$

It is not difficult to show that Dudley’s bound is suboptimal in this setting [16,
Exercise 2.2.15]. We will show that Theorem 1.2 yields the following optimal bound.

Proposition 3.5. In the setting of this section, we have

$$\gamma_2(B) \lesssim \Sigma := \max_{i\le d} b_i\sqrt{\log(i+1)}.$$

Of course, this result could easily be obtained from Theorem 1.1, and a rather
difficult geometric proof using growth functionals can be found in [15, §8]. However,
the point for our purposes is that this result follows in a completely elementary
fashion from Theorem 1.2. To apply the latter, let us first identify the sets $B_t$.


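Lemma 3.6. For any $t \ge 0$, we have

$$B_t = \Big\{y \in B : \sum_{i=1}^d \frac{1}{b_i^2}\,\mathbf{1}_{y_i\ne 0} \le t^2\Big\}.$$

Proof. While $\|\cdot\|_B$ is not smooth, we can easily compute its subdifferential:

$$\partial\|y\|_B = \{z : z_i = \mathrm{sign}(y_i)/b_i \text{ if } y_i \ne 0,\ |z_i| \le 1/b_i \text{ if } y_i = 0\}.$$

We therefore obtain

$$\inf_{z\in\partial\|y\|_B}\|z\|^2 = \sum_{i=1}^d \frac{1}{b_i^2}\,\mathbf{1}_{y_i\ne 0},$$

and the result follows from Corollary 2.8. □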


Lemma 3.6 shows that the sets $B_t$ are very thin indeed: they consist of sparse
vectors. Controlling the entropy numbers of such sets is an easy exercise; for each 
fixed sparsity pattern we can discretize using a standard volumetric argument, while 
counting the number of sparsity patterns is a matter of simple combinatorics. 
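
The sparsity constraint is easy to state in code. The sketch below tests membership in $B_t$ for the octahedron by combining the gauge $\sum_i |y_i|/b_i \le 1$ with the smallest Euclidean norm of a subgradient, obtained by setting $z_i = 0$ wherever $y_i = 0$ as in the proof of Lemma 3.6. The function names and the sample vectors are assumptions chosen for illustration.

```python
import math


def octahedron_gauge(y, b):
    """Gauge of B = absconv{b_i e_i}: sum_i |y_i| / b_i."""
    return sum(abs(yi) / bi for yi, bi in zip(y, b))


def min_subgradient_norm(y, b):
    """Smallest Euclidean norm of a subgradient: sqrt(sum over y_i != 0 of 1 / b_i^2)."""
    return math.sqrt(sum(1 / bi ** 2 for yi, bi in zip(y, b) if yi != 0))


def in_B_t(y, b, t):
    """Membership test for B_t: y lies in B and has a subgradient of norm at most t."""
    return octahedron_gauge(y, b) <= 1 and min_subgradient_norm(y, b) <= t


b = [1.0, 0.5, 0.25]          # hypothetical semiaxes
one_sparse = [0.5, 0.0, 0.0]  # supported on the largest semiaxis only
two_sparse = [0.2, 0.1, 0.0]  # also uses the second semiaxis

print(in_B_t(one_sparse, b, t=1.0))
print(in_B_t(two_sparse, b, t=2.0), in_B_t(two_sparse, b, t=3.0))
```

Coordinates with small $b_i$ are expensive: each one adds $1/b_i^2$ to the budget $t^2$, which is why only few of them can be active at a given scale.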

Lemma 3.7. There is a universal constant $c > 0$ such that for all $n \ge 0$

$$e_n(B_{c2^{n/2}/\Sigma}) \lesssim 2^{-n} b_1.$$

Proof. Fix $n \ge 0$. As $1/b_i^2 \ge \log(i+1)/\Sigma^2$ by definition, we have

$$B_t \subseteq C_t := \Big\{y \in B : \sum_{i=1}^d \log(i+1)\,\mathbf{1}_{y_i\ne 0} \le \Sigma^2 t^2\Big\}.$$

It suffices to control the entropy numbers of the larger set $C_{c2^{n/2}/\Sigma}$.

Let us begin with some counting. Denote by $\mathcal{I}$ the family of all admissible
sparsity patterns of $y \in C_{c2^{n/2}/\Sigma}$, that is, $\mathcal{I}$ is the family of all $I \subseteq [d]$ such that

$$\sum_{i\in I}\log(i+1) \le c^2 2^n.$$







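Denote by $\mathcal{I}_k \subseteq \mathcal{I}$ the family of all $I \in \mathcal{I}$ with cardinality $|I| = k$. Let us bound
the number of such sets. Setting $c := \sqrt{\log 2}/2$, we can estimate

$$|\mathcal{I}_k| = \sum_{|I|=k}\mathbf{1}_{I\in\mathcal{I}} \le 2^{2^{n-1}}\sum_{|I|=k}\prod_{i\in I}\frac{1}{(i+1)^2}.$$

The right-hand side can be bounded as follows:

$$\sum_{|I|=k}\prod_{i\in I}\frac{1}{(i+1)^2} \le \frac{1}{k!}\Big[\sum_{\ell\ge 1}\frac{1}{(\ell+1)^2}\Big]^k \le \frac{1}{k!},$$

where we have used that

$$\sum_{1\le i_1<i_2<\cdots<i_k\le d}\ \prod_{j=1}^k \frac{1}{(i_j+1)^2} \le \frac{1}{k!}\prod_{j=1}^k\sum_{\ell\ge 1}\frac{1}{(\ell+1)^2}, \qquad \sum_{\ell\ge 1}\frac{1}{(\ell+1)^2} \le \sum_{\ell\ge 1}\frac{1}{\ell(\ell+1)} = 1.$$

We have therefore shown that $|\mathcal{I}_k| \le 2^{2^{n-1}}/k!$.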
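
The counting bound can be verified by brute force for small parameters. With $c = \sqrt{\log 2}/2$, the admissibility condition $\sum_{i\in I}\log(i+1) \le c^2 2^n$ is equivalent to $\prod_{i\in I}(i+1)^4 \le 2^{2^n}$, which the sketch below checks in exact integer arithmetic. The function name `count_patterns` and the choice $d = 10$ are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial, prod


def count_patterns(d, n, k):
    """Count subsets I of {1, ..., d} with |I| = k that are admissible at level n.

    Admissible means sum_{i in I} log(i + 1) <= (log 2 / 4) * 2^n, which is
    rewritten as prod_{i in I} (i + 1)^4 <= 2^(2^n) to avoid floating point.
    """
    limit = 2 ** (2 ** n)
    return sum(
        1
        for I in combinations(range(1, d + 1), k)
        if prod(i + 1 for i in I) ** 4 <= limit
    )


d = 10
for n in range(6):
    for k in range(d + 1):
        count = count_patterns(d, n, k)
        # Bound from the text: |I_k| <= 2^(2^(n-1)) / k!
        assert count * factorial(k) <= 2 ** (2 ** (n - 1))

print(count_patterns(10, 3, 1))  # patterns {1}, {2}, {3}: prints 3
```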

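Let $\varepsilon \le b_1$ be a constant to be chosen later on. For every $I \in \mathcal{I}$, choose a
minimal $\varepsilon$-net $T_I$ for the Euclidean ball in $\mathbb{R}^I$ with radius $b_1$, and denote by $T$ the
union of all these sets $T_I$. Evidently $T$ is an $\varepsilon$-net for $C_{c2^{n/2}/\Sigma}$. Let us estimate its
cardinality. A standard volumetric argument yields [2, Corollary 4.1.15]

$$|T_I| \le \Big(\frac{3b_1}{\varepsilon}\Big)^{|I|}.$$

We can therefore estimate

$$|T| \le \sum_{k=0}^d \Big(\frac{3b_1}{\varepsilon}\Big)^k |\mathcal{I}_k| \le 2^{2^{n-1}}\sum_{k=0}^d \frac{1}{k!}\Big(\frac{3b_1}{\varepsilon}\Big)^k < 2^{2^{n-1}} e^{3b_1/\varepsilon}.$$

If we choose $\varepsilon = (6/\log 2)\,2^{-n} b_1$, we find that $|T| < 2^{2^n}$, which establishes the claim
whenever $2^n \ge 6/\log 2$ (as we assumed that $\varepsilon \le b_1$ in the volumetric estimate). For
$2^n < 6/\log 2$, simply note the trivial bound $e_n(C_{c2^{n/2}/\Sigma}) \le \mathrm{diam}(B) \le 2b_1$. □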

With this entropy estimate in hand, the proof of Proposition 3.5 is an immediate 
consequence of Lemma 3.7 and Theorem 1.2 with $a = c/\Sigma$.


3.3. A counterexample. The aim of this section is to show that Theorem 1.2 does 
not always give sharp results. As the example that we will discuss is a conceptually 
important one, let us briefly consider this example in a broader context. 

A remarkable consequence of Theorem 1.1 is that $\gamma_2(\mathrm{conv}(T)) \asymp \gamma_2(T)$ for any
(non-convex) subset $T \subset \mathbb{R}^d$ of Euclidean space: as the supremum of a linear
function over a convex set is attained at an extreme point, Theorem 1.1 yields

$$\gamma_2(T) \asymp \mathbf{E}\Big[\sup_{x\in T}\langle x,g\rangle\Big] = \mathbf{E}\Big[\sup_{x\in\mathrm{conv}(T)}\langle x,g\rangle\Big] \asymp \gamma_2(\mathrm{conv}(T))$$

(here $g$ denotes a standard Gaussian vector in $\mathbb{R}^d$). It is a long-standing open
problem to understand the geometric mechanism behind this fundamental fact; cf.
[16, §2.4]. By using a known device [16, Theorem 2.4.18], one can reduce this
problem to the following special case: it suffices to give a geometric proof of the
fact that for any $x_1,\ldots,x_n \in \mathbb{R}^d$ such that $\|x_1\| \ge \|x_2\| \ge \cdots \ge \|x_n\| > 0$, we have

$$\gamma_2(B) \lesssim \max_{i\le n} \|x_i\|\sqrt{\log(i+1)}, \qquad B = \mathrm{absconv}\{x_i : i = 1,\ldots,n\}.$$
We solved this problem in the previous section under the additional assumption
that the vectors $x_i$ are orthogonal. It is not known, however, how this conclusion
can be established in the absence of the orthogonality assumption. The results of 
this paper originated in an attempt by the author to understand this issue. We will 
presently illustrate that Theorem 1.2 does not directly resolve this problem. 

The example that we will consider is defined as follows. Fix $0 < \varepsilon < 1$ and let
$u = d^{-1/2}\mathbf{1}$, where $\mathbf{1}$ is the vector of ones (note that $\|u\| = 1$). We consider the set

$$B = \mathrm{absconv}\{x_i : i = 1,\ldots,d\}, \qquad x_i = e_i + \varepsilon u.$$

This is a small perturbation of the example in the previous section where all vertices
of the simplex have been shifted along the diagonal. One can show as in [16, Exercise
2.2.15] that $\gamma_2(B) \asymp \sqrt{\log d}$, while Dudley’s bound is of order $(\log d)^{3/2}$.

We claim that Theorem 1.2 does not improve on Dudley’s bound in the present
setting: the sets $B_t$ are not sufficiently small to gain any improvement. This
unfortunate conclusion is contained in the following lemma.


Lemma 3.8. We have $B_t \supseteq \mathrm{conv}\{x_i : i = 1,\ldots,d\}$ for all $t \ge 1/\varepsilon$.


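Proof. Let $V = [x_1 \ \cdots \ x_d]$ be the square matrix whose columns are the vectors
$x_i$. Note that $V$ is invertible, and we have $\|x\|_B = \|V^{-1}x\|_1$. Therefore

$$\partial\|x\|_B = (V^*)^{-1}\,\partial\|V^{-1}x\|_1 \ni (V^*)^{-1}\,\mathrm{sign}(V^{-1}x),$$

where $\mathrm{sign}(z)$ operates entrywise on a vector $z$ and we set $\mathrm{sign}(0) := 1$. In particular,

$$B_t \supseteq \{x \in B : \mathrm{sign}(V^{-1}x) \in tV^*B_{\mathbb{R}^d}\}$$

by Corollary 2.8, where $B_{\mathbb{R}^d}$ denotes the Euclidean unit ball in $\mathbb{R}^d$.

Now note that if $x \in \mathrm{conv}\{x_i : i = 1,\ldots,d\}$, then $V^{-1}x$ has nonnegative entries
and thus $\mathrm{sign}(V^{-1}x) = \mathbf{1}$. It therefore suffices to show that $\mathbf{1} \in tV^*B_{\mathbb{R}^d}$ whenever
$t \ge 1/\varepsilon$. But this is a simple consequence of the definition of $V$, as

$$tV^*v = \mathbf{1} \qquad\text{for}\qquad v = \frac{u}{t(\varepsilon + d^{-1/2})},$$

and clearly $\|v\| \le 1$ when $t \ge 1/\varepsilon$. This completes the proof. □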
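The final identity of the proof is easy to confirm numerically. The sketch below builds $V$ with columns $x_i = e_i + \varepsilon u$, forms $v = u/(t(\varepsilon + d^{-1/2}))$, and reports both the residual of $tV^*v = \mathbf{1}$ and the Euclidean norm of $v$. The helper name `check_lemma_3_8` and the sample values of $d$, $\varepsilon$, $t$ are assumptions made for illustration.

```python
from math import sqrt

def check_lemma_3_8(d, eps, t):
    """Return (max_i |t*(V^T v)_i - 1|, ||v||) for v = u / (t*(eps + d^(-1/2)))."""
    u = [1.0 / sqrt(d)] * d
    # Columns of V are x_i = e_i + eps*u, so V[j][i] = delta_{ji} + eps*u[j].
    V = [[(1.0 if j == i else 0.0) + eps * u[j] for i in range(d)] for j in range(d)]
    v = [u_j / (t * (eps + 1.0 / sqrt(d))) for u_j in u]
    # (V^T v)_i = sum_j V[j][i] * v[j]
    Vt_v = [sum(V[j][i] * v[j] for j in range(d)) for i in range(d)]
    residual = max(abs(t * w - 1.0) for w in Vt_v)
    norm_v = sqrt(sum(x * x for x in v))
    return residual, norm_v

residual, norm_v = check_lemma_3_8(d=50, eps=0.1, t=10.0)
print(residual, norm_v)  # residual at rounding level, norm_v below 1
```

Here $t = 1/\varepsilon$ is the borderline case of the lemma; larger $t$ only decreases $\|v\|$.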


Let $\Delta^{d-1}$ be the standard simplex in $\mathbb{R}^d$. Lemma 3.8 shows that $B_t \supseteq \Delta^{d-1} + \varepsilon u$
whenever $t \ge 1/\varepsilon$. Setting $n_{a,\varepsilon} = (2\log_2(1/a\varepsilon))_+$, we can estimate

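$$\sum_{n\ge0} 2^{n/2} e_n(B_{a2^{n/2}}) \ge \sum_{n\ge n_{a,\varepsilon}} 2^{n/2} e_n(\Delta^{d-1}) \gtrsim (\log d)^{3/2} - C\,n_{a,\varepsilon}\sqrt{\log d}$$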

for some constant $C > 0$, where we used that $e_n(\Delta^{d-1}) \gtrsim 2^{-n/2}\sqrt{\log d}$ for $n \lesssim \log d$
[16, Exercise 2.2.15]. We have therefore shown that Theorem 1.2 does not improve
on Dudley’s bound in this example unless $\varepsilon$ is polynomially small in $d$.
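To make the phrase "polynomially small" concrete, recall that the bound of Theorem 1.2 includes a term of order $1/a$, so that $a$ cannot be chosen arbitrarily small for free. The sketch below computes $n_{a,\varepsilon} = (2\log_2(1/a\varepsilon))_+$ and the smallest value of $1/a$ for which $n_{a,\varepsilon}$ reaches $\log_2 d$. The cutoff $\log_2 d$ is an assumption made purely for illustration (the text only asserts the entropy lower bound for $n \lesssim \log d$), and the function names are hypothetical.

```python
from math import log2, sqrt

def n_threshold(a, eps):
    """n_{a,eps} = (2*log2(1/(a*eps)))_+, the first scale n with a*2^(n/2) >= 1/eps."""
    return max(0.0, 2.0 * log2(1.0 / (a * eps)))

def min_inverse_a(d, eps):
    """Smallest 1/a with n_threshold(a, eps) >= log2(d), namely eps*sqrt(d)."""
    return eps * sqrt(d)

d, eps = 10 ** 6, 0.01
a = 1.0 / min_inverse_a(d, eps)
print(n_threshold(a, eps), log2(d), 1.0 / a)
```

Under this illustrative cutoff, discarding all the scales that contribute to the lower bound forces $1/a \ge \varepsilon\sqrt{d}$, which stays below $(\log d)^{3/2}$ only when $\varepsilon$ is of order $d^{-1/2}$ up to logarithmic factors.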


Remark 3.9. Of course, the example described in this section is sufficiently simple
that we can make some manual adjustments to obtain a sharp geometric construction.
Indeed, we clearly have $B \subseteq B_1 + B_2$ where $B_1$ denotes the $\ell_1$-ball in $\mathbb{R}^d$ and
$B_2 = \{\alpha u : |\alpha| \le \varepsilon\}$ is one-dimensional. Theorem 1.2 gives a sharp generic chaining
construction for $B_1$, while a trivial discretization of $\alpha$ suffices to control $B_2$. We can
then glue together the generic chaining constructions for $B_1$ and $B_2$ by summing
the corresponding nets. It is not clear, however, how one could construct such a
decomposition in the general setting described at the beginning of this section.
4. Geometry and Entropy Contraction


In the previous section, we illustrated the utility of Theorem 1.2 in specific 
examples. The computations hinge, however, on a sufficiently explicit description 
of the sets Bt, which may not always be available in more general situations. For 
example, if we consider the examples of the previous section under general norms, 
it may be nontrivial to control the sets Bt directly. It is therefore of interest to 
develop more systematic methods to control the geometry of the sets $B_t$.

As a prototype of what one might hope for, let us reconsider the setting of
$\ell_q$-ellipsoids in Hilbert space. Theorem 1.2 bounds $\gamma_2(B)$ in terms of the entropy
numbers of the sets $B_t$, which we computed explicitly in section 3.1. However,
Lemma 3.2 suggests that the correct behavior of $\gamma_2(B)$ in this example can also be
expressed in terms of the entropy numbers of $B$ itself: we easily verify that

$$\gamma_2(B) \asymp \Big[\sum_{n\ge0} \big(2^{n/2}e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q}.$$


The appearance of such a bound is not a coincidence. Talagrand has shown that an
upper bound of this form holds for any $q$-convex set $B$ [16, §4.1]: as $\ell_q$-ellipsoids are
$\max(2,q)$-convex, this provides an alternative explanation for the behavior of
$\ell_q$-ellipsoids in the case $2 \le q < \infty$. One of the insights to be developed in this section
is that this fundamental property of $q$-convex sets is fully explained by Theorem 1.2.
Roughly speaking, we will show that the $q$-convexity assumption forces the sets $B_t$
to be much smaller than $B$ itself in the sense that $e_n(B_t) \lesssim t^{1/(q-1)} e_n(B)^{q/(q-1)}$,
from which the above bound is easily deduced. More generally, this phenomenon
suggests that the chaining principle for general convex sets given by Theorem 1.2
can be significantly simplified in the presence of additional geometric structure.

It turns out that there is nothing special about $q$-convexity per se, but that the
entropy contraction phenomenon illustrated above arises from a much more general
geometric mechanism. We develop a general formulation of this idea in section 4.1.
We then demonstrate how the requisite structure arises in two distinct settings:
the case of $q$-convex sets is developed in section 4.2, while the case of $\ell_q$-balls in
Banach spaces with an unconditional basis is developed in section 4.3.


4.1. A geometric principle. Let $(X, \|\cdot\|)$ be a Banach space and let $B \subset X$ be
a symmetric compact convex set. The sets $B_t$ are defined as in Theorem 1.2. The
following geometric principle is the main result of this section.


Theorem 4.1. Let $q > 1$ and $K > 0$ be given constants, and suppose that

$$\|y - z\|_B^q \le Kt\,\|y - z\| \quad\text{for every } y,z \in B_t,\ t \ge 0.$$

Then

$$\gamma_p(B) \lesssim \Big[\sum_{n\ge0} \big(2^{n/p}e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q},$$

where the universal constant depends on $p$, $q$, and $K$ only.


Like Theorem 1.2, the message of Theorem 4.1 is that the behavior of $\gamma_p(B)$ is
strictly better than would be expected from Dudley’s bound. Unlike Theorem 1.2,
however, the presence of additional geometric structure allows us to bound $\gamma_p(B)$
only in terms of the entropy numbers of $B$ itself. This bound could therefore be
applied even without an explicit description of $B_t$. Of course, there is no free lunch:
the assumption of Theorem 4.1 requires us to understand the metric structure of the
sets $B_t$. Fortunately, we will see in the sequel that there are interesting situations
in which this can be accomplished without explicitly computing the sets $B_t$.

Remark 4.2. Before we turn to the proof of Theorem 4.1, it is instructive to
consider the significance of the geometric assumption of Theorem 4.1. Observe
that we always have, regardless of any assumptions, the following simple fact:

$$\|y\|_B \le t\,\|y\| \quad\text{for every } y \in B_t,\ t \ge 0.$$

Indeed, if $z \in X^*$ is as in the definition of $B_t$, then

$$\|y\|_B = \langle z,y\rangle \le \|z\|^*\|y\| \le t\,\|y\|.$$

We therefore see that by construction, an element $y \in B_t$ with small norm must be
contained in a small dilation $y \in t\|y\|B$ of the original set $B$. The assumption of
Theorem 4.1 asks that a weaker form of this property hold not only for norms, but
also for distances: that is, if $y,z \in B_t$, then $y - z \in (Kt\|y - z\|)^{1/q}B$. This does not
follow automatically from the corresponding property for norms, as it is typically
not true that $B_t - B_t \subseteq cB_{ct}$ for some constant $c$. Nonetheless, this intuition proves
to be useful as it will help us identify how the requisite geometric structure arises.
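The simple fact at the start of Remark 4.2 can be tested in a concrete case. The sketch below assumes, for illustration only, that $B$ is the $\ell_q$ unit ball in $\mathbb{R}^d$ and that $\|\cdot\|$ is the Euclidean norm, and it uses the description of $B_t$ obtained later in the proof of Corollary 4.10: a point $x$ lies in $B_t$ exactly when $\||x|^{q-1}\| \le t\|x\|_B^{q-1}$. For the smallest such $t$, the inequality $\|x\|_B \le t\|x\|$ reduces to Cauchy-Schwarz. Both sides are homogeneous of degree one in $x$, so the sample point need not be rescaled into $B$. The function names are hypothetical.

```python
import random
from math import sqrt

def smallest_t(x, q):
    """Smallest t with x in B_t, for B the l_q unit ball and ||.|| Euclidean (assumed setting)."""
    norm_q = sum(abs(c) ** q for c in x) ** (1.0 / q)
    grad_norm = sqrt(sum(abs(c) ** (2 * (q - 1)) for c in x))
    return grad_norm / norm_q ** (q - 1)

def gauge_vs_norm(x, q):
    """Return (||x||_B, t*||x||) with t = smallest_t(x, q)."""
    norm_q = sum(abs(c) ** q for c in x) ** (1.0 / q)
    norm_2 = sqrt(sum(c * c for c in x))
    return norm_q, smallest_t(x, q) * norm_2

random.seed(0)
for q in (1.5, 2.0, 3.0):
    x = [random.uniform(-1, 1) for _ in range(6)]
    print(q, gauge_vs_norm(x, q))  # first entry should not exceed the second
```

For $q = 2$ the two entries coincide up to rounding, since then $\|x\|_B = \|x\|$ and the smallest admissible $t$ equals one.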

The main idea behind the proof of Theorem 4.1 is the following observation. 

Lemma 4.3. Suppose that the assumption of Theorem 4.1 holds. Then

$$e_{n+1}(B_t) \le \big(Kt\,e_n(B_t)\big)^{1/q} e_n(B) \quad\text{for every } n \ge 0,\ t \ge 0.$$

Proof. Fix $\varepsilon > 0$. By the definition of entropy numbers, we can cover $B_t$ by less
than $2^{2^n}$ balls of radius $(1+\varepsilon)e_n(B_t)$. By our assumption, each of these balls
(intersected with $B_t$) is contained in a translate of $sB$ with $s \le (1+\varepsilon)^{1/q}(Kt\,e_n(B_t))^{1/q}$.
Therefore, each of these balls can be further covered by less than $2^{2^n}$ balls of radius
$(1+\varepsilon)s\,e_n(B)$. We have now covered $B_t$ by less than $2^{2^n}\cdot 2^{2^n} = 2^{2^{n+1}}$ balls of
radius $\le (1+\varepsilon)^2(Kt\,e_n(B_t))^{1/q}e_n(B)$. Letting $\varepsilon \downarrow 0$ completes the proof. □

An annoying feature of Lemma 4.3 is that the entropy number on the left-hand
side is $e_{n+1}(B_t)$ rather than $e_n(B_t)$. If it were the case that $e_n(B_t) \lesssim e_{n+1}(B_t)$
(that is, if we knew a priori that the entropy numbers do not decay too quickly),
then we could simplify the conclusion of Lemma 4.3 to

$$e_n(B_t) \lesssim t^{1/(q-1)} e_n(B)^{q/(q-1)}.$$

This expression quantifies in the present setting in what sense the sets $B_t$ are much
smaller than the original set $B$. From this expression, it would be easy to conclude
the result of Theorem 4.1: substituting the above bound into Theorem 1.2 yields

$$\gamma_p(B) \lesssim \frac{1}{a} + a^{1/(q-1)} \sum_{n\ge0} \big(2^{n/p}e_n(B)\big)^{q/(q-1)},$$

and the conclusion of Theorem 4.1 would follow by optimizing over $a > 0$. The
main technical issue in the proof of Theorem 4.1 is to show that its conclusion
remains valid even when the regularity assumption $e_n(B_t) \lesssim e_{n+1}(B_t)$ does not
hold, which we do by means of a routine dyadic regularization argument.
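The optimization over $a$ mentioned above is elementary, and the following sketch spells it out. Writing $S$ for the sum $\sum_{n\ge0}(2^{n/p}e_n(B))^{q/(q-1)}$, the function $a \mapsto 1/a + a^{1/(q-1)}S$ is minimized at $a^* = (S/(q-1))^{-(q-1)/q}$, and the minimum is a constant depending only on $q$ times $S^{(q-1)/q}$. The names `bound` and `optimal_a` and the sample values of $S$ are assumptions for illustration.

```python
def bound(a, S, q):
    """Value of 1/a + a^(1/(q-1)) * S."""
    return 1.0 / a + a ** (1.0 / (q - 1.0)) * S

def optimal_a(S, q):
    """Minimizer of a -> bound(a, S, q), obtained by setting the derivative to zero."""
    r = 1.0 / (q - 1.0)
    return (r * S) ** (-1.0 / (r + 1.0))

q = 3.0
for S in (1.0, 10.0, 1000.0):
    a_star = optimal_a(S, q)
    # The ratio below should be the same for every S: it depends on q only.
    ratio = bound(a_star, S, q) / S ** ((q - 1.0) / q)
    print(S, a_star, ratio)
```

The printed ratio equals $r^{(q-1)/q} + r^{-1/q}$ with $r = 1/(q-1)$, so the optimized bound scales as $S^{(q-1)/q}$.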




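Proof of Theorem 4.1. Fix a constant $\lambda > 0$ to be chosen at a later stage. For any
set $C$, we introduce the regularized entropy numbers $d_n(C) \ge e_n(C)$ as

$$d_n(C) := \max_{0\le k\le n} 2^{-\lambda(n-k)} e_k(C).$$

Using Lemma 4.3, we estimate

$$\begin{aligned}
d_n(B_t) &\le \max_{0\le k\le n+1} 2^{-\lambda(n-k)} e_k(B_t) \\
&\le 2^{-\lambda n}\,\mathrm{diam}(B) + 2^{\lambda} \max_{0\le k\le n} 2^{-\lambda(n-k)} e_{k+1}(B_t) \\
&\lesssim 2^{-\lambda n}\,\mathrm{diam}(B) + \max_{0\le k\le n} 2^{-\lambda(n-k)}\, t^{1/q} e_k(B_t)^{1/q} e_k(B) \\
&\le 2^{-\lambda n}\,\mathrm{diam}(B) + d_n(B_t)^{1/q} \max_{0\le k\le n} 2^{-\lambda(n-k)(q-1)/q}\, t^{1/q} e_k(B).
\end{aligned}$$

Therefore, using $a^{1/q}b^{(q-1)/q} \le a/q + b(q-1)/q$, we obtain

$$d_n(B_t) \lesssim 2^{-\lambda n}\,\mathrm{diam}(B) + \max_{0\le k\le n} 2^{-\lambda(n-k)}\, t^{1/(q-1)} e_k(B)^{q/(q-1)}.$$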

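In particular, we can crudely bound

$$\sum_{n\ge0} 2^{n/p} e_n(B_{a2^{n/p}}) \lesssim \mathrm{diam}(B) \sum_{n\ge0} 2^{n/p}2^{-\lambda n} + a^{1/(q-1)} \sum_{n\ge0}\sum_{0\le k\le n} 2^{nq/((q-1)p)}\,2^{-\lambda(n-k)}\, e_k(B)^{q/(q-1)}.$$

In order for the sums to converge we must choose $\lambda > q/((q-1)p)$, so we fix for
concreteness $\lambda = 2q/((q-1)p)$ (the precise value of $\lambda$ does not matter). This yields

$$\begin{aligned}
\sum_{n\ge0} 2^{n/p} e_n(B_{a2^{n/p}})
&\lesssim \mathrm{diam}(B) + a^{1/(q-1)} \sum_{n\ge0}\sum_{0\le k\le n} 2^{-(n-2k)q/((q-1)p)}\, e_k(B)^{q/(q-1)} \\
&= \mathrm{diam}(B) + a^{1/(q-1)} \sum_{k\ge0}\sum_{n\ge k} 2^{-(n-k)q/((q-1)p)} \big(2^{k/p}e_k(B)\big)^{q/(q-1)} \\
&\lesssim \mathrm{diam}(B) + a^{1/(q-1)} \sum_{k\ge0} \big(2^{k/p}e_k(B)\big)^{q/(q-1)}.
\end{aligned}$$

Applying Corollary 2.7 and optimizing over $a > 0$ yields

$$\gamma_p(B) \lesssim \mathrm{diam}(B) + \Big[\sum_{n\ge0} \big(2^{n/p}e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q}.$$

It remains to note that $\mathrm{diam}(B) \le 2e_0(B)$, so that the first term can be absorbed
in the second at the expense of the universal constant. □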
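Two elementary steps of this proof lend themselves to a quick numerical check. The first is the use of Young's inequality, which we read here as a self-bounding argument: a number $d \ge 0$ with $d \le A + d^{1/q}M$ satisfies $d \le \frac{q}{q-1}A + M^{q/(q-1)}$. The second is the exchange of the order of summation followed by a geometric series, written with $c$ in place of $q/((q-1)p)$ and $x_k$ in place of $e_k(B)^{q/(q-1)}$. The sketch below tests both on sample data; all function names and numerical values are assumptions for illustration.

```python
def largest_solution(A, M, q, iters=200):
    """Largest d >= 0 with d <= A + d^(1/q) * M, located by bisection."""
    g = lambda d: d - A - d ** (1.0 / q) * M  # convex in d, with g(0) <= 0
    lo, hi = 0.0, 1.0
    while g(hi) <= 0.0:  # grow the bracket until g(hi) > 0
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo

def young_bound(A, M, q):
    """The bound (q/(q-1)) * A + M^(q/(q-1)) obtained from Young's inequality."""
    return q / (q - 1.0) * A + M ** (q / (q - 1.0))

def double_sum(x, c):
    """sum_{n=0}^{N} sum_{k=0}^{n} 2^(-(n-2k)*c) * x[k], with N = len(x) - 1."""
    N = len(x) - 1
    return sum(2.0 ** (-(n - 2 * k) * c) * x[k]
               for n in range(N + 1) for k in range(n + 1))

def single_sum_bound(x, c):
    """(1 - 2^(-c))^(-1) * sum_k 2^(k*c) * x[k], from the geometric series in n."""
    return sum(2.0 ** (k * c) * xk for k, xk in enumerate(x)) / (1.0 - 2.0 ** (-c))

print(largest_solution(1.0, 1.0, 2.0), young_bound(1.0, 1.0, 2.0))
x = [0.3, 1.7, 0.0, 2.5, 0.9, 4.2]
print(double_sum(x, 1.0), single_sum_bound(x, 1.0))
```

In the first check the prefactor $q/(q-1)$ is visible explicitly, which is consistent with the constant deteriorating as $q$ approaches one.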


Remark 4.4. An inspection of the proof shows that the universal constant in
Theorem 4.1 blows up as $q \downarrow 1$. It would be interesting to understand whether
there is an analogue of Theorem 4.1 that holds in the limiting case $q = 1$: that is,
whether there is a general geometric mechanism that ensures the sharp bound

$$\gamma_p(B) \asymp \sup_{n\ge0} 2^{n/p} e_n(B)$$

(that the right-hand side is a lower bound on $\gamma_p(B)$ is trivial). This situation is
illustrated by the example of section 3.2: in this case both the assumption and the
conclusion of Theorem 4.1 hold for $q = 1$ (the assumption holds by Remark 4.2 and
$B_t - B_t \subseteq 2B_{2t}$, while the conclusion can be deduced from [16, Exercise 2.2.15]),
but Theorem 4.1 is not sufficiently sharp to capture this example.

4.2. Uniformly convex sets. In this section, we exhibit an important situation
where the assumption of Theorem 4.1 can be verified by imposing additional
geometric structure on the set $B$: we show that the assumption holds when $B$ is
$q$-convex. This recovers a fundamental result of Talagrand [16, §4.1].

Let $(X, \|\cdot\|)$ be any Banach space, and let $B \subset X$ be a symmetric convex set.
As usual, we denote by $\|\cdot\|_B$ the gauge of $B$. We recall the following definition.

Definition 4.5. Let $q \ge 2$. A symmetric convex set $B$ is called $q$-convex if

$$\Big\|\frac{x+y}{2}\Big\|_B \le 1 - \eta\,\|x-y\|_B^q$$

for all $x,y \in B$, where $\eta > 0$ is an absolute constant.

We will prove the following result. 


Corollary 4.6 ([16]). Let $B$ be a symmetric convex set in a Banach space $(X, \|\cdot\|)$,
and assume that $B$ is $q$-convex (with constant $\eta$). Then

$$\gamma_p(B) \lesssim \Big[\sum_{n\ge0} \big(2^{n/p}e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q},$$

where the universal constant depends on $p$, $q$, and $\eta$ only.


To connect this result to the explicit computations in section 3.1, we recall that
$\ell_q$-ellipsoids are $\max(2,q)$-convex [3]. This shows that the case $2 \le q < \infty$ of
Proposition 3.1 is in fact a manifestation of the much more general phenomenon
described by Corollary 4.6: we emphasize that the present result requires no
assumption of any kind on the norm $\|\cdot\|$. On the other hand, it is impossible for a
convex set to be $q$-convex with $q < 2$ (Hilbert space is maximally convex), so that
uniform convexity cannot explain the behavior of $\ell_q$-ellipsoids for $q < 2$. We will
nonetheless see in the next section that the latter case can also be understood as a
manifestation of the general geometric principle described by Theorem 4.1.

We prove Corollary 4.6 by verifying the assumption of Theorem 4.1. 


Lemma 4.7. Let $B$ be a $q$-convex set and $t \ge 0$. Then

$$\|y - z\|_B^q \lesssim t\,\|y - z\| \quad\text{for every } y,z \in B_t,$$

where the universal constant depends on $q$ and $\eta$ only.


We will give two different proofs of this lemma. The first proof is pedestrian, 
but perhaps not very intuitive. The second proof is more intuitive, as it is close in 
spirit to the intuition developed in Remark 4.2; however, this proof requires us to 
use an alternative (but equivalent) formulation of the $q$-convexity property.


First proof. By Proposition 2.5, we have $\pi_t(y) = y$ for $y \in B_t$. Thus

$$\|y\|_B = \inf_u \{\|u\|_B + t\|y - u\|\} \le \Big\|\frac{y+z}{2}\Big\|_B + t\,\Big\|\frac{y-z}{2}\Big\|$$

for any $y,z \in B_t$. Similarly, exchanging the role of $y$ and $z$, we obtain

$$1 \le \Big\|\frac{y+z}{2\gamma}\Big\|_B + t\,\Big\|\frac{y-z}{2\gamma}\Big\|, \qquad \gamma := \|y\|_B \vee \|z\|_B.$$

But note that $\|y/\gamma\|_B \le 1$ and $\|z/\gamma\|_B \le 1$ by the definition of $\gamma$. Therefore,
applying the $q$-convexity assumption to the first term on the right yields

$$\|y - z\|_B^q \le \frac{\gamma^{q-1}}{2\eta}\, t\,\|y - z\|$$

for any $y,z \in B_t$. The proof is completed by noting that $\gamma \le 1$. □


Second proof. An equivalent characterization of the $q$-convexity property is as
follows [17, Corollary 1]: $B$ is $q$-convex if and only if

$$\langle j_y - j_z, y - z\rangle \gtrsim \|y - z\|_B^q$$

for all $j_y \in J_y := \{u \in X^* : \langle u,y\rangle = \|y\|_B^q,\ \|u\|_B^* \le \|y\|_B^{q-1}\}$ and $j_z \in J_z$, where
the universal constant depends on $q$, $\eta$ only. Note that $J_y$ is none other than the
subdifferential of the map $y \mapsto \|y\|_B^q/q$ (cf. Corollary 2.8), so this characterization
is rather intuitive: $B$ is $q$-convex precisely when the map $y \mapsto \|y\|_B^q$ exhibits a
uniform improvement over the usual first-order condition for convexity.

With this formulation in hand, the lemma follows easily. Let $y,z \in B_t$. By
definition of $B_t$, we can choose $u_y \in X^*$ with $\langle u_y,y\rangle = \|y\|_B$, $\|u_y\|_B^* \le 1$, $\|u_y\|^* \le t$.
Choose $u_z \in X^*$ analogously. Setting $j_y = u_y\|y\|_B^{q-1}$ and $j_z = u_z\|z\|_B^{q-1}$ gives

$$\|y - z\|_B^q \lesssim \langle j_y - j_z, y - z\rangle \le \|j_y - j_z\|^*\|y - z\| \le 2t\,\|y - z\|.$$

This completes the proof. □

It is now trivial to complete the proof of Corollary 4.6. 

Proof of Corollary 4.6. We may as well assume that $B$ is compact: if $B$ is not
precompact, the right-hand side of the desired inequality is infinite and there is 
nothing to prove; if B is precompact, there is no loss of generality in assuming that 
it is also closed. It remains to apply Theorem 4.1 and Lemma 4.7. □ 

4.3. $\ell_q$-balls and unconditional bases. We have seen in the previous section
that uniform convexity cannot explain the behavior of $\ell_q$-ellipsoids in Hilbert space
that was observed in section 3.1. We will presently show that this behavior is
nonetheless a manifestation of the general geometric principle of Theorem 4.1. It
will follow immediately that the same behavior persists in a much larger family of
Banach spaces (but not in a setting as general as for $q$-convex sets).

To understand what is going on, let us take inspiration from the second proof of
Lemma 4.7 (and from Remark 4.2). For any $x \in X$, choose any point $j_x \in X^*$
such that $\langle j_x,x\rangle = \|x\|_B^q$ and $\|j_x\|_B^* \le \|x\|_B^{q-1}$. As

$$\|y - z\|_B^q = \langle j_{y-z}, y - z\rangle \le \|j_{y-z}\|^*\|y - z\|,$$

the assumption of Theorem 4.1 would follow if we could show that $\|j_{y-z}\|^* \lesssim t$
whenever $y,z \in B_t$. We can always choose $\|j_x\|^* \le t$ when $x \in B_t$, but this does
not in itself yield the desired result: $y,z \in B_t$ does not imply $y - z \in B_t$.

To obtain the desired bound, we must find a relation between $j_{y-z}$ and $j_y$, $j_z$. The
$q$-convexity assumption provides the inequality $\langle j_{y-z}, y - z\rangle \lesssim \langle j_y - j_z, y - z\rangle$, which
is particularly convenient for this purpose. However, this is by no means the only
way to achieve our goal. In the case of $\ell_q$-ellipsoids, we will use a completely different
geometric property: in this case we observe that $|j_{y-z}| \lesssim |j_y| + |j_z|$ coordinatewise.
This simple device allows us to reach the same conclusion as in the $q$-convex case
as long as the dual norm $\|\cdot\|^*$ respects the coordinatewise ordering.

We proceed to make this idea precise. We first recall the class of Banach spaces 
that possess the desired monotonicity properties [1, §3.1]. 


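Definition 4.8. Let $(X, \|\cdot\|)$ be a Banach space and let $\{e_n\}$ be a basis for $X$.
The basis is said to be unconditional with constant $K$ if

$$\Big\|\sum_{n=1}^N a_n e_n\Big\| \le K\,\Big\|\sum_{n=1}^N b_n e_n\Big\|$$

for all $N \in \mathbb{N}$ and scalars $a_n, b_n \in \mathbb{R}$ such that $|a_n| \le |b_n|$ for all $n$.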


We recall for future reference that if $\{e_n\}$ is an unconditional basis in $X$ with
constant $K$, then the biorthogonal sequence $\{e_n^*\}$ is an unconditional basic sequence
in $X^*$ with the same constant $K$ [1, Proposition 3.2.1].


Remark 4.9. The notion of a $K$-unconditional basis is often defined in a slightly
different way than we have done above: a basis is unconditional with constant $K$ if

$$\Big\|\sum_{n=1}^N \varepsilon_n b_n e_n\Big\| \le K\,\Big\|\sum_{n=1}^N b_n e_n\Big\|$$

for all $N \in \mathbb{N}$, $b_n \in \mathbb{R}$, and $\varepsilon_n \in \{-1,+1\}$, that is, if the norm of $\sum_{n=1}^N b_n e_n$ is
approximately invariant to sign changes of the coefficients $b_n$. The more general
property of Definition 4.8 is however readily deduced from this alternative definition
(for example, by choosing random signs $\varepsilon_n$ such that $a_n = \mathbf{E}[\varepsilon_n b_n]$).
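The random-sign device in the last sentence can be made explicit. If $|a| \le |b|$ and $b \ne 0$, a sign $\varepsilon$ with $\mathbf{P}(\varepsilon = +1) = (1 + a/b)/2$ satisfies $\mathbf{E}[\varepsilon b] = a$. The sketch below verifies this in exact rational arithmetic; the function names are hypothetical, and the case $b = 0$ (which forces $a = 0$) is deliberately left out.

```python
from fractions import Fraction

def sign_probability(a, b):
    """P(eps = +1) for a random sign eps with E[eps*b] = a, assuming |a| <= |b| and b != 0."""
    return (1 + Fraction(a) / Fraction(b)) / 2

def expected_value(a, b):
    """E[eps*b] under the distribution returned by sign_probability(a, b)."""
    p = sign_probability(a, b)
    return p * Fraction(b) - (1 - p) * Fraction(b)

p = sign_probability(Fraction(1, 3), -2)
print(p, expected_value(Fraction(1, 3), -2))
```

With such signs chosen independently for each $n$, the triangle inequality for expectations gives $\|\sum_n a_n e_n\| \le \mathbf{E}\|\sum_n \varepsilon_n b_n e_n\| \le K\|\sum_n b_n e_n\|$, which is the property of Definition 4.8.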


In the following let $(X, \|\cdot\|)$ be a Banach space and let $\{e_n\}$ be an unconditional
basis with constant $K$. Fix $1 < q < \infty$, and define the $\ell_q$-ball $B \subset X$ as follows:

$$B := \Big\{ \sum_{i=1}^d z_i e_i : \sum_{i=1}^d |z_i|^q \le 1 \Big\}$$

(our result will be independent of $d$, and therefore extends readily to infinite
dimension). Note that the $\ell_q$-ellipsoids considered in section 3.1 correspond to the
special case where $\{e_i\}$ is the standard basis in $\mathbb{R}^d$ and $\|x\|^2 = \sum_{i=1}^d b_i^2 x_i^2$.


Corollary 4.10. In the setting of this section, we have

$$\gamma_p(B) \lesssim \Big[\sum_{n\ge0} \big(2^{n/p}e_n(B)\big)^{q/(q-1)}\Big]^{(q-1)/q},$$

where the universal constant depends on $p$, $q$, and $K$ only.

Proof. The norm $\|\cdot\|$ on $X$ can be transferred to $\mathbb{R}^d$ by defining $\|z\| := \|\sum_{i=1}^d z_i e_i\|$
for $z \in \mathbb{R}^d$. There is therefore no loss of generality in assuming that $X = \mathbb{R}^d$ with
the above norm, that $\{e_i\} = \{e_i^*\}$ is the standard basis, and that $\|x\|_B$ is the
$\ell_q$-norm on $\mathbb{R}^d$, as we will do in the sequel for notational simplicity. (We emphasize,
however, that $\|\cdot\|$ is not the Euclidean norm, so that the present setting does not
reduce to the Euclidean setting considered previously in section 3.1.)

As $\|x\|_B$ is the $\ell_q$-norm, we can compute

$$\frac{\partial\|x\|_B}{\partial x_i} = \frac{|x_i|^{q-1}}{\|x\|_B^{q-1}}\,\mathrm{sign}(x_i).$$

By Corollary 2.8, we can write

$$B_t = \big\{x \in B : \big\||x|^{q-1}\,\mathrm{sign}(x)\big\|^* \le t\,\|x\|_B^{q-1}\big\}.$$

Now note that for any vectors $x,y \in \mathbb{R}^d$, we have

$$\|x - y\|_B^q = \big\langle |x-y|^{q-1}\,\mathrm{sign}(x-y),\, x - y\big\rangle \le \big\||x-y|^{q-1}\,\mathrm{sign}(x-y)\big\|^*\,\|x - y\|.$$

Moreover, as $|x - y|^{q-1} \le 2^{(q-2)_+}(|x|^{q-1} + |y|^{q-1})$, we have

$$\big\||x-y|^{q-1}\,\mathrm{sign}(x-y)\big\|^* \le 2^{(q-2)_+}K\,\big\||x|^{q-1} + |y|^{q-1}\big\|^* \le 2^{1+(q-2)_+}K^2 t$$

for all $x,y \in B_t$ using the unconditional property of the dual basis $\{e_i^*\}$. Thus

$$\|x - y\|_B^q \le 2^{1+(q-2)_+}K^2 t\,\|x - y\|$$

whenever $x,y \in B_t$, and it remains to invoke Theorem 4.1. □
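The scalar inequality $|x - y|^{q-1} \le 2^{(q-2)_+}(|x|^{q-1} + |y|^{q-1})$ used in the proof, which is applied coordinatewise, follows from subadditivity of $s \mapsto s^{q-1}$ when $q \le 2$ and from convexity when $q > 2$. The sketch below checks it on a grid of real numbers; the function name `power_gap` and the grid are illustrative assumptions.

```python
def power_gap(x, y, q):
    """2^((q-2)_+) * (|x|^(q-1) + |y|^(q-1)) - |x - y|^(q-1); nonnegative if the inequality holds."""
    const = 2.0 ** max(q - 2.0, 0.0)
    return const * (abs(x) ** (q - 1.0) + abs(y) ** (q - 1.0)) - abs(x - y) ** (q - 1.0)

grid = [i / 4.0 for i in range(-8, 9)]
for q in (1.2, 2.0, 3.5):
    worst = min(power_gap(x, y, q) for x in grid for y in grid)
    print(q, worst)  # expected to be nonnegative up to rounding
```

For $q > 2$ the constant $2^{q-2}$ cannot be improved, since the gap vanishes when $y = -x$.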


We have now given two distinct explanations for the behavior of $\ell_q$-ellipsoids
observed in section 3.1. When $q \ge 2$, such sets are $q$-convex and the result follows
from the general principle described by Corollary 4.6. In this setting, the result
remains valid when $\|\cdot\|$ is an arbitrary norm. When $q < 2$, the observed behavior
is described by Corollary 4.10, which exploits a more special geometric property of
$\ell_q$-balls. In this setting, the result also remains valid for a large class of norms $\|\cdot\|$,
but we require the additional restriction that the underlying basis is unconditional.
It appears that these two cases possess a genuinely different geometry, which is
completely hidden in the statement of Proposition 3.1.


Acknowledgments. The author would like to thank the anonymous referees for
helpful comments that improved the presentation of this paper.